83 research outputs found

    Grounding spatial prepositions for video search

    Spatial language video retrieval is an important real-world problem that forms a test bed for evaluating semantic structures for natural language descriptions of motion on naturalistic data. Video search by natural language query requires that linguistic input be converted into structures that operate on video in order to find clips that match a query. This paper describes a framework for grounding the meaning of spatial prepositions in video. We present a library of features that can be used to automatically classify a video clip based on whether it matches a natural language query. To evaluate these features, we collected a corpus of natural language descriptions about the motion of people in video clips. We characterize the language used in the corpus, and use it to train and test models for the meanings of the spatial prepositions "to," "across," "through," "out," "along," "towards," and "around." The classifiers can be used to build a spatial language video retrieval system that finds clips matching queries such as "across the kitchen." United States. Office of Naval Research (MURI N00014-07-1-0749

    A System for Generalized 3D Multi-Object Search

    Searching for objects is a fundamental skill for robots. As such, we expect object search to eventually become an off-the-shelf capability for robots, similar to, e.g., object detection and SLAM. In contrast, however, no system for 3D object search exists that generalizes across real robots and environments. In this paper, building upon a recent theoretical framework that exploited the octree structure for representing belief in 3D, we present GenMOS (Generalized Multi-Object Search), the first general-purpose system for multi-object search (MOS) in a 3D region that is robot-independent and environment-agnostic. GenMOS takes as input point cloud observations of the local region, object detection results, and localization of the robot's view pose, and outputs a 6D viewpoint to move to through online planning. In particular, GenMOS uses point cloud observations in three ways: (1) to simulate occlusion; (2) to inform occupancy and initialize octree belief; and (3) to sample a belief-dependent graph of view positions that avoid obstacles. We evaluate our system both in simulation and on two real robot platforms. Our system enables, for example, a Boston Dynamics Spot robot to find a toy cat hidden underneath a couch in under one minute. We further integrate 3D local search with 2D global search to handle larger areas, demonstrating the resulting system in a 25 m² lobby area. Comment: 8 pages, 9 figures, 1 table. IEEE Conference on Robotics and Automation (ICRA) 202

    Grounding language in spatial routines

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2006. Includes bibliographical references (p. 105-108). This thesis describes a spatial language understanding system based on a lexicon of words defined in terms of spatial routines. A spatial routine is a script composed from a set of primitive operations on sensor data, analogous to Ullman's visual routines. By finding a set of primitives that underlie natural spatial language, the meaning of spatial terms can be succinctly expressed in a way that can be used to obey natural language commands. This hypothesis is tested by using spatial routines to build a natural language interface to a real-time strategy game, in which a player controls an army of units in a battle. The system understands the meaning of context-dependent natural language commands such as "Run back!" and "Move the marines on top above the flamethrowers on the bottom." In evaluation, the system successfully interpreted a range of spatial commands not seen during implementation, and exceeded the performance of a baseline system. Beyond real-time strategy games, spatial routines may provide the basis for interpreting spatial language in a broad range of physically situated language understanding systems, such as mobile robots or other computer game genres. by Stefanie Tellex. S.M